21 research outputs found

    A Form of List Viterbi Algorithm for Decoding Convolutional Codes

    The Viterbi algorithm is a maximum-likelihood decoding algorithm used to decode convolutional codes in several wireless communication systems, including Wi-Fi. The standard Viterbi algorithm produces a single decoded output, which may be correct or incorrect. Incorrect packets are normally discarded, necessitating retransmission and hence causing considerable energy loss and delay. Some real-time applications, such as Voice over Internet Protocol (VoIP) telephony, do not tolerate excessive delay, which makes the conventional Viterbi decoding strategy sub-optimal. In this regard, a modified approach involving a form of List Viterbi decoding of the convolutional code is investigated. The technique combines the bit-error correction capabilities of the Viterbi algorithm with Cyclic Redundancy Check (CRC) procedures: it first uses a form of 'List Viterbi Algorithm' (LVA) to generate a list of candidate decoded outputs after the trellis search, and a CRC check then determines whether a correct outcome is present. Simulation results show considerable improvement in bit-error performance compared to the classical approach.
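    The decode-then-check step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate list is hard-coded rather than produced by an actual trellis search, and CRC-32 stands in for whichever CRC the system uses.

```python
import zlib

def crc_select(candidates):
    """Return the payload of the first candidate whose trailing 4-byte
    CRC-32 matches, mimicking LVA + CRC post-selection: the list Viterbi
    search proposes candidates, and the CRC check picks the survivor."""
    for cand in candidates:
        payload, received = cand[:-4], cand[-4:]
        if zlib.crc32(payload).to_bytes(4, "big") == received:
            return payload
    return None  # every candidate fails: fall back to retransmission

# Toy usage: the first candidate is corrupted, the second is intact.
good = b"hello"
frame = good + zlib.crc32(good).to_bytes(4, "big")
corrupt = b"hellp" + frame[-4:]
print(crc_select([corrupt, frame]))  # -> b'hello'
```

    If no candidate passes the CRC, the decoder can still request a retransmission, so the scheme only reduces, rather than eliminates, retransmissions.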

    HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

    Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been a victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XLM-T model provides better performance, with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research. Comment: To appear in the Proceedings of the Sixth Workshop on Widening Natural Language Processing at EMNLP 2022.
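    For reference, the weighted F1 reported above averages per-class F1 scores weighted by class frequency. A self-contained toy implementation (not the paper's evaluation code, which presumably uses a standard library routine) might look like:

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with class-frequency weights."""
    classes = sorted(set(y_true))
    total = len(y_true)
    score_sum = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score_sum += f1 * (y_true.count(c) / total)  # weight by support
    return score_sum

# Toy usage: one of two labels is misclassified.
print(round(weighted_f1([0, 1], [0, 0]), 3))  # -> 0.333
```

    Weighting by support matters for hate-speech data, where the hateful class is typically the minority.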

    PERCEIVED CORRELATION BETWEEN COMMUNICATION STYLES AND INTERPERSONAL CONFLICT RESOLUTION AMONG INTERNATIONAL STUDENTS IN MALAYSIA

    Background and Purpose: A good and fulfilling relationship among individuals from distinct cultural backgrounds depends on effective communication. This research examined the perceived relationship between communication styles and interpersonal conflict resolution among international students in Malaysian universities.

    Methodology: The study employed a cross-sectional survey in which self-developed structured questionnaires were used to gather data from a random sample of 324 international students in 15 higher institutions across Kuala Lumpur, Malaysia. The data were analyzed using multiple regression analysis.

    Findings: The findings revealed a significant positive relationship between communication styles and interpersonal conflict resolution among international students. Specifically, the passive, passive-aggressive, and assertive communication styles have a significant positive relationship with conflict resolution. However, the aggressive communication style exerts an insignificant effect on conflict resolution (t = 0.734, p = 0.463); thus, the students generally believe this style does not help to resolve interpersonal conflict. These outcomes suggest the students' readiness to cultivate a peaceful learning environment.

    Contributions: This study provides relevant information that can help educational decision-makers strengthen cross-cultural collaboration among international students in the Malaysian context. This information can also facilitate successful academic, professional, and social cooperation.

    Keywords: Cross-cultural relationship, interpersonal conflict resolution, communication styles, international students, Malaysia.

    Cite as: Mohammed, S., Nasidi, Q. Y., Muhammed, M. U., Umar, M. M., & Hassan, I. (2023). Perceived correlation between communication styles and interpersonal conflict resolution among international students in Malaysia. Journal of Nusantara Studies, 8(2), 352-372. http://dx.doi.org/10.24200/jonus.vol8iss2pp352-372

    Deep Sequence Models for Text Classification Tasks

    The exponential growth of data generated on the Internet in the current information age is a driving force for the digital economy, and extracting information is the major source of value in accumulated big data. Machine learning algorithms that depend on statistical analysis and hand-engineered rules are overwhelmed by the vast complexities inherent in human languages. Natural Language Processing (NLP) equips machines to understand these diverse and complicated human languages. Text classification is an NLP task that automatically identifies patterns based on predefined or undefined labeled sets. Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection. In texts, some sequences of words depend on the previous or next word sequences to make full meaning; this is a challenging dependency task that requires the machine to store important earlier information so it can affect later meaning. Sequence models such as RNNs, GRUs, and LSTMs are a breakthrough for tasks with long-range dependencies. As such, we applied these models to binary and multi-class classification. The results were excellent, with most of the models performing in the range of 80% to 94%. However, this result is not exhaustive, as we believe there is room for improvement if machines are to compete with humans.
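    The state-carrying recurrence these models share can be illustrated with a bare vanilla-RNN cell, written in pure Python for readability. Real RNN/GRU/LSTM layers add gating and run on optimized tensor libraries; this sketch only shows how a hidden state propagates information across a sequence.

```python
import math

def rnn_step(x, h, Wx, Wh, b):
    """One recurrence step: h' = tanh(Wx @ x + Wh @ h + b)."""
    return [math.tanh(sum(Wx[i][j] * x[j] for j in range(len(x)))
                      + sum(Wh[i][j] * h[j] for j in range(len(h)))
                      + b[i])
            for i in range(len(h))]

def encode(sequence, h0, Wx, Wh, b):
    """Fold a whole token sequence through the cell. Because h is
    carried forward, an early token can still influence the final
    state -- the long-range dependency discussed above."""
    h = h0
    for x in sequence:
        h = rnn_step(x, h, Wx, Wh, b)
    return h
```

    A downstream classifier would read the final hidden state (or a pooled summary of all states) and map it to class probabilities.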

    Semi-automatic approaches for exploiting shifter patterns in domain-specific sentiment analysis

    This paper describes two different approaches to sentiment analysis. The first is a symbolic approach that exploits a sentiment lexicon together with a set of shifter patterns and rules. The sentiment lexicon includes single words (unigrams) and is developed automatically by exploiting labeled examples. The shifter patterns cover intensification, attenuation/downtoning and inversion/reversal, and are developed manually. The second approach exploits a deep neural network that uses a pre-trained language model. Both approaches were applied to texts in the economics and finance domains from newspapers in European Portuguese. We show that the symbolic approach achieves virtually the same performance as the deep neural network. In addition, the symbolic approach provides understandable explanations, and the acquired knowledge can be communicated to others. We release the shifter patterns to motivate future research in this direction.
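    The mechanics of the three shifter classes can be illustrated with a toy lexicon-plus-shifters scorer. The lexicon entries, shifter words, and weights below are hypothetical stand-ins, not the released resources, and real shifter patterns operate over richer contexts than the single preceding token used here.

```python
# Hypothetical mini-lexicon and shifter sets (illustrative only).
LEXICON = {"growth": 1.0, "profit": 1.0, "loss": -1.0, "risk": -0.5}
INTENSIFIERS = {"very": 1.5, "strongly": 1.5}
ATTENUATORS = {"slightly": 0.5, "somewhat": 0.5}
INVERTERS = {"not", "no", "never"}

def score(tokens):
    """Sum lexicon polarities, letting the preceding token shift each
    one: intensify, attenuate, or invert -- the three pattern classes."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else None
        if prev in INTENSIFIERS:
            polarity *= INTENSIFIERS[prev]
        elif prev in ATTENUATORS:
            polarity *= ATTENUATORS[prev]
        elif prev in INVERTERS:
            polarity *= -1.0
        total += polarity
    return total

print(score(["very", "profit"]))  # -> 1.5
```

    Because every score decomposes into lexicon hits and the shifters applied to them, the prediction comes with a built-in explanation, which is the interpretability advantage the abstract notes.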

    HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

    This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, text-only machine translation, and multimodal machine translation. Comment: Accepted at ACL 2023 as a long paper (Findings).

    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

    Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers. The data is used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We describe the data collection methodology, annotation process, and related challenges when curating each of the datasets. We conduct experiments with different sentiment classification baselines and discuss their usefulness. We hope AfriSenti enables new work on under-represented languages. The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023 and can also be loaded as a Hugging Face dataset (https://huggingface.co/datasets/shmuhammad/AfriSenti). Comment: 15 pages, 6 figures, 9 tables.

    Humoral immunological kinetics of severe acute respiratory syndrome coronavirus 2 infection and diagnostic performance of serological assays for coronavirus disease 2019: an analysis of global reports

    As the coronavirus disease 2019 (COVID-19) pandemic continues to rise and second waves are reported in some countries, serological test kits and strips are being considered to scale up an adequate laboratory response. This study provides an update on the kinetics of the humoral immune response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and the performance characteristics of serological protocols (lateral flow assay [LFA], chemiluminescence immunoassay [CLIA] and ELISA) used for evaluation of recent and past SARS-CoV-2 infection. A thorough and comprehensive review of suitable and eligible full-text articles was performed on PubMed, Scopus, Web of Science, Worldometer and medRxiv from 10 January to 16 July 2020. These articles were searched using the Medical Subject Headings terms 'COVID-19', 'Serological assay', 'Laboratory Diagnosis', 'Performance characteristics', 'POCT', 'LFA', 'CLIA', 'ELISA' and 'SARS-CoV-2'. Data from original research articles on SARS-CoV-2 antibody detection from the second day post-infection onward were included in this study. In total, there were 7938 published articles on the humoral immune response and laboratory diagnosis of COVID-19. Of these, 74 were included in this study. The detection, peak and decline periods of blood anti-SARS-CoV-2 IgM, IgG and total antibodies for point-of-care testing (POCT), ELISA and CLIA vary widely. The most promising of these assays for POCT detected anti-SARS-CoV-2 at day 3 post-infection and peaked on the 15th day; ELISA products detected anti-SARS-CoV-2 IgM and IgG at days 2 and 6, then peaked on the eighth day; and the most promising CLIA product detected anti-SARS-CoV-2 at day 1 and peaked on the 30th day. The LFA, ELISA and CLIA products with the best performance characteristics were those targeting total SARS-CoV-2 antibodies, followed by those targeting anti-SARS-CoV-2 IgG and then IgM.
Essentially, the CLIA-based SARS-CoV-2 tests had the best performance characteristics, followed by ELISA and then POCT. Given the varied performance characteristics of all the serological assays, there is a need to continuously improve their detection thresholds, as well as to monitor and re-evaluate their performances, to assure their significance and applicability for COVID-19 clinical and epidemiological purposes.

    Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: at least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases. Comment: Accepted at TACL; pre-MIT Press publication version.
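    As a rough illustration of the kind of automatic sentence-level check such an audit can complement human review with, here is a crude quality filter. The heuristics and thresholds are made up for the sketch, not the paper's actual criteria.

```python
def acceptable(sentence: str, min_words: int = 3, min_alpha: float = 0.5) -> bool:
    """Toy quality heuristics: the sentence must contain at least
    min_words whitespace-separated tokens, and at least min_alpha of
    its characters must be letters or spaces (rejecting boilerplate
    made of digits, markup debris, or symbol runs)."""
    words = sentence.split()
    if len(words) < min_words:
        return False
    alpha = sum(c.isalpha() or c.isspace() for c in sentence)
    return alpha / max(len(sentence), 1) >= min_alpha

print(acceptable("This is a normal sentence."))  # -> True
```

    Checks like this catch obviously unusable text, but the mislabeled-language and ambiguous-language-code problems the audit reports still require language identification or human judgment.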

    MasakhaNEWS: News Topic Classification for African languages

    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as little as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) using the PET approach. Comment: Accepted to IJCNLP-AACL 2023 (main conference).